Reward-Weighted Regression Converges to a Global Optimum

Abstract

Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic improvement of the policy under certain circumstances, whether and under which conditions RWR converges to the optimal policy have remained open questions. In this paper, we provide for the first time a proof that RWR converges to a global optimum when no function approximation is used, in a general compact setting. Furthermore, for the simpler case with finite state and action spaces, we prove R-linear convergence of the state-value function to the optimum.
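The iteration described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' code, and it rests on several assumptions that are not stated on this page: rewards are strictly positive, expectations are computed exactly instead of from sampled trajectories, and the return-weighted maximum-likelihood fit without function approximation reduces to reweighting each action probability by its action value, pi_new(a|s) proportional to pi(a|s) * Q(s,a). The two-state MDP (`P`, `R`, `GAMMA`) and all function names are hypothetical.

```python
# Hypothetical 2-state, 2-action MDP with strictly positive rewards.
GAMMA = 0.9
N_S, N_A = 2, 2
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
# R[s][a] = immediate reward (assumed > 0 so the weights stay positive).
R = [[1.0, 2.0],
     [0.5, 3.0]]


def q_from_v(v):
    """One-step lookahead: Q(s, a) = R(s, a) + gamma * E[v(next state)]."""
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_S))
             for a in range(N_A)] for s in range(N_S)]


def evaluate(pi, sweeps=400):
    """Iterative policy evaluation; returns (V, Q) for policy pi."""
    v = [0.0] * N_S
    for _ in range(sweeps):
        q = q_from_v(v)
        v = [sum(pi[s][a] * q[s][a] for a in range(N_A)) for s in range(N_S)]
    return v, q_from_v(v)


def rwr_step(pi):
    """Assumed tabular update: reweight pi(a|s) by Q(s, a), then renormalize."""
    v, q = evaluate(pi)
    new_pi = []
    for s in range(N_S):
        weights = [pi[s][a] * q[s][a] for a in range(N_A)]
        total = sum(weights)
        new_pi.append([w / total for w in weights])
    return new_pi, v


def optimal_values(sweeps=400):
    """Value iteration, used only as a reference for the optimum."""
    v = [0.0] * N_S
    for _ in range(sweeps):
        v = [max(row) for row in q_from_v(v)]
    return v


def run_rwr(iterations=200):
    """Start from the uniform policy and record V before each update."""
    pi = [[1.0 / N_A] * N_A for _ in range(N_S)]
    history = []
    for _ in range(iterations):
        pi, v = rwr_step(pi)
        history.append(v)
    return pi, history
```

Calling `run_rwr()` and comparing the last entry of `history` with `optimal_values()` gives a rough check of the two properties the abstract mentions for this toy problem: the state values do not decrease from one iteration to the next, and they approach the optimum.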

Related resources

Collective Learning Generally Overcomes Local Optima and Converges to the Global Optimum

Local minima represent a major problem for neural network learning procedures. In this article we present a new procedure, collective learning, that leads to improved global convergence. We have tested our procedure on several neural networks and on the multimodal functions proposed by De Jong and Rastrigin. In our tests we have reached a success ratio of 100 %. In addition we give a few remark...

Episodic Reinforcement Learning by Logistic Reward-Weighted Regression

It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today’s standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immedi...

A modification to geographically weighted regression

BACKGROUND Geographically weighted regression (GWR) is a modelling technique designed to deal with spatial non-stationarity, e.g., the mean values vary by locations. It has been widely used as a visualization tool to explore the patterns of spatial data. However, the GWR tends to produce unsmooth surfaces when the mean parameters have considerable variations, partly due to that all parameter es...

A WEIGHTED LINEAR REGRESSION MODEL FOR IMPERCISE RESPONSE

A weighted linear regression model with imprecise response and p real explanatory variables is analyzed. The LR fuzzy random variable is introduced and a metric is suggested for coping with this kind of variable. A least squares solution for estimating the parameters of the model is derived. The results are illustrated by means of some case studies.

On rigorous upper bounds to a global optimum

In branch and bound algorithms in constrained global optimization, a sharp upper bound on the global optimum is important for the overall efficiency of the branch and bound process. Software to find local optimizers, using floating point arithmetic, often computes an approximately feasible point close to an actual global optimizer. Not mathematically rigorous algorithms can simply evaluate the ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i8.20811